max rank | avg. rank | sentence |
---|---|---|
132 | 38.3750 | Qua epocha 2011 per dies circa solem movebatur. |
279 | 56.7500 | Qua epocha 2007 per dies circa solem movebatur. |
304 | 59.8750 | Qua epocha 2014 per dies circa solem movebatur. |
341 | 64.5000 | Qua epocha 2005 per dies circa solem movebatur. |
350 | 144.4286 | Anno 2008 - 2012 senator huius factionis erat. |
381 | 133.2857 | In regione etiam vicus eiusdem nominis est. |
433 | 76.0000 | Qua epocha 15 per dies circa solem movebatur. |
480 | 157.2000 | Urbs est caput provinciae, et in media regione condita est. |
493 | 208.6250 | Hoc munere ad usque annum 1990 operam dedit. |
509 | 85.5000 | Qua epocha 16 per dies circa solem movebatur. |
510 | 85.6250 | Qua epocha 9 per dies circa solem movebatur. |
530 | 88.1250 | Qua epocha 22 per dies circa solem movebatur. |
531 | 88.2500 | Qua epocha 14 per dies circa solem movebatur. |
544 | 89.8750 | Qua epocha 11 per dies circa solem movebatur. |
546 | 90.1250 | Qua epocha 13 per dies circa solem movebatur. |
565 | 92.5000 | Qua epocha 1998 per dies circa solem movebatur. |
566 | 92.6250 | Qua epocha 2002 per dies circa solem movebatur. |
587 | 95.2500 | Qua epocha 19 per dies circa solem movebatur. |
588 | 232.7143 | Ab anno 1946 urbs Saxoniae Inferioris est. |
588 | 266.0000 | Anno 1946 pars Saxoniae Inferioris factum erat. |
588 | 237.4286 | Anno 1946 sodalis factionis SPD factus est. |
588 | 209.7778 | Quod autem ab anno 1946 pars Saxoniae Inferioris est. |
600 | 96.8750 | Qua epocha 24 per dies circa solem movebatur. |
609 | 236.2500 | Postea usque ad mortem senator ad vitam mansit. |
631 | 235.3000 | Annis 2004 - 2012 etiam praeses suae factionis Berolini operam dedit. |
632 | 164.8571 | Anno 2012 etiam senator Romaniae electus est. |
667 | 105.2500 | Qua epocha 1945 per dies circa solem movebatur. |
670 | 324.2857 | Ab anno 1998 professor Universitatis eiusdem erat. |
680 | 106.8750 | Qua epocha 1974 per dies circa solem movebatur. |
702 | 195.8750 | Cum pater mortuus est, octo annos natus erat. |
The maximum word rank of a sentence is, by definition, the rank of its rarest word. If it is low, every word in the sentence is a high-frequency word. For this reason, the sentences with the lowest maximum word rank may be of interest. The table lists these sentences, restricted to those longer than 40 characters.
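As a minimal sketch of this definition, the following Python snippet computes the maximum and the average rank for one tokenized sentence. The function name `sentence_ranks` and the rank values in `toy_rank` are invented for illustration and are not taken from the corpus.

```python
def sentence_ranks(words, rank_of):
    """Return (maximum rank, average rank) over the words of one sentence."""
    ranks = [rank_of[w] for w in words]
    return max(ranks), sum(ranks) / len(ranks)


# Hypothetical ranks, invented for illustration (1 = most frequent word).
toy_rank = {"in": 1, "et": 2, "est": 3, "urbs": 40, "caput": 120}

# The rarest word ("caput") alone determines the maximum rank.
max_rank, avg_rank = sentence_ranks(["urbs", "est", "caput"], toy_rank)
```

A single rare word is enough to push the maximum rank up, whereas the average rank reacts more slowly; this is why the two columns of the table can differ so much.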
The overall distribution of the maximum rank across all sentences of the corpus is shown in a diagram with a log-scaled x-axis.
The sentences in the table described above are of interest because they are usually easy to understand. The distribution may give insights into the corpus and may give parameters for language comparison.
The distribution can be estimated from a small corpus, but sentences like those in the table are rare, so a larger corpus yields more and better examples.
Table data:
select max(w_id)-100 as m, avg(w_id)-100 as a, s.sentence from sentences s, inv_w i where s.s_id=i.s_id and length(sentence)>40 and i.w_id>100 group by s.s_id order by m limit 30;
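The table query can be tried on toy data with an in-memory SQLite database, as sketched below. The table and column names come from the query itself; the column types, the sample rows, and all `w_id` values are assumptions made for illustration. The query treats `w_id - 100` as the rank and skips `w_id <= 100`; the sketch simply assumes that those low ids do not denote ordinary words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns the query uses.
conn.executescript("""
    CREATE TABLE sentences (s_id INTEGER PRIMARY KEY, sentence TEXT);
    CREATE TABLE inv_w (s_id INTEGER, w_id INTEGER);
""")
conn.executemany("INSERT INTO sentences VALUES (?, ?)", [
    (1, "In regione etiam vicus eiusdem nominis est."),
    (2, "Urbs est."),  # too short, removed by the length filter
    (3, "Urbs est caput provinciae, et in media regione condita est."),
])
# Hypothetical word ids; (1, 5) stands for an id <= 100 that the query skips.
conn.executemany("INSERT INTO inv_w VALUES (?, ?)", [
    (1, 5), (1, 103), (1, 110), (1, 230),
    (2, 102), (2, 103),
    (3, 102), (3, 103), (3, 400),
])

rows = conn.execute("""
    SELECT max(w_id) - 100 AS m, avg(w_id) - 100 AS a, s.sentence
    FROM sentences s, inv_w i
    WHERE s.s_id = i.s_id AND length(sentence) > 40 AND i.w_id > 100
    GROUP BY s.s_id
    ORDER BY m
    LIMIT 30
""").fetchall()
```

Each row of `rows` holds the maximum rank, the average rank, and the sentence text, in the same column order as the table.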
Distribution data:
select m, count(*) from (select 100* round((max(w_id)-100)/100) as m from sentences s, inv_w i where s.s_id=i.s_id and i.w_id>100 group by s.s_id) aa group by m;
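The binning step of the distribution query, rounding each maximum rank to the nearest multiple of 100 and counting sentences per bin, might look like this in Python. The helper name `bin_max_ranks` is hypothetical, and the input is just the first seven maximum ranks from the table, used as sample values rather than as the real distribution. Integer arithmetic is used so that halves round up; Python's built-in `round` rounds halves to the nearest even number and could put values such as 250 into a different bin.

```python
from collections import Counter


def bin_max_ranks(max_ranks, width=100):
    """Count sentences per bin, rounding each rank to the nearest multiple of width."""
    counts = Counter()
    for r in max_ranks:
        # (r + width // 2) // width rounds r / width with halves going up.
        counts[(r + width // 2) // width * width] += 1
    return dict(sorted(counts.items()))


# Sample input: the first seven maximum ranks listed in the table.
toy_max_ranks = [132, 279, 304, 341, 350, 381, 433]
histogram = bin_max_ranks(toy_max_ranks)
```

Plotting the bin values on a log-scaled x-axis against their counts would give a diagram of the kind described in the text.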
Explain the distribution, especially the increase in its right part.